Abstract

We prove a theorem for coding mixed-state quantum signals. For a class
of coding schemes, the von Neumann entropy $S$ of the density operator
describing an ensemble of mixed quantum signal states is shown to be equal to
the number of spin-1/2 systems necessary to represent the signal faithfully.
This generalizes previous work on coding pure quantum signal states and is
analogous to Shannon's noiseless coding theorem of classical information
theory. We also discuss an example of a more general class of coding schemes
which beat the limit set by our theorem.

PACS numbers: 03.65, 05.30, 89.70

A key concept in classical information theory developed by Shannon [1] and others [2] is
the entropy. For a discrete random variable (source) $A$, it is defined by

$$H(A) = -\sum_a p(a) \log_2 p(a). \qquad (1)$$

Coding is an important issue in information theory. In particular, one may be interested in
representing the messages produced by the source $A$ by a sequence of binary digits (bits)
as short as possible. Suppose that $A$ emits a sequence of independent messages. If we allow
ourselves to code entire blocks of independent messages together and tolerate an arbitrarily
small error in the signals reconstructed from the coded version, it turns out that the mean
number of bits per message needed can be made arbitrarily close to $H(A)$.
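
As a small numerical illustration of Eq. (1), the following sketch computes the Shannon entropy of a discrete source. The three-message distribution is a hypothetical example chosen here, not one taken from the text.

```python
import math

def shannon_entropy(probs):
    """Return H = -sum_a p(a) log2 p(a) in bits, skipping zero-probability messages."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical source with three messages
source = [0.5, 0.25, 0.25]
print(shannon_entropy(source))  # 1.5
```

For this source, block coding can bring the mean number of bits per message arbitrarily close to 1.5, compared with the 2 bits that a fixed-length code for three messages would need.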

Recently, there has been much interest in the subject of quantum computation. Current
investigations include the physical implementation of quantum computers, quantum
complexity theory, quantum teleportation and quantum coding. In quantum coding,
Schumacher [3] and Jozsa and Schumacher [4] have considered the possibility that the signals
are pure quantum states which are not necessarily orthogonal to one another. Suppose that
a quantum source $A$ emits a sequence of independent signals, each of which is a pure state
from the list $|a_1\rangle, \ldots, |a_m\rangle$ occurring with probabilities $p_1, \ldots, p_m$. We may associate the
density matrix

$$\rho = \sum_{i=1}^{m} p_i |a_i\rangle\langle a_i| \qquad (2)$$

to the source. By analogy with the bit, the classical measure of information realized as a 2-state
classical system, Schumacher used the term "qubit" (meaning quantum bit) for the quantum
state storage capacity of a two-dimensional Hilbert space. Note that, unlike a classical bit,
which can only take on a value of either 0 or 1, a qubit can be in a coherent
superposition of 0 and 1, i.e. the state of a qubit is $|u\rangle = \alpha|0\rangle + \beta|1\rangle$ where $\alpha, \beta \in \mathbb{C}$ and
$|\alpha|^2 + |\beta|^2 = 1$. Moreover, a qubit is capable of being entangled with the states of other
qubits. For example, the state $\frac{1}{\sqrt{2}}(|10\rangle - |01\rangle)$ is allowed. The polarization of a single
photon, for example, has a storage capacity of one qubit. We wish to encode the signals
with the least possible number of Hilbert space dimensions. Once again, block coding may be
used and a small error may be allowed. In other words, we consider a $K$-blocked version
$A_K$ of $A$. If $A$ has $m$ distinct signal states in a Hilbert space $\mathcal{H}_n$ (of dimension $n$), then $A_K$ has
$m^K$ signals in $\mathcal{H}_{n^K}$ (of dimension $n^K$). In order to code the signals with a minimum number
of Hilbert space dimensions, typically part of a system will be discarded during the coding.
Therefore, the signal $|a_i\rangle$ is reconstituted as a mixed state with density matrix $W_i$. In Refs.
[3] and [4] the concept of fidelity

$$F = \sum_i p_i \langle a_i|W_i|a_i\rangle \qquad (3)$$

was introduced. Notice that $\langle a_i|W_i|a_i\rangle$ is the probability that the state $W_i$ passes the yes/no
test of being the state $|a_i\rangle$. Thus $F$, with $0 \le F \le 1$, is the average probability of passing the test.

By analogy with classical information theory, we introduce the von Neumann entropy

$$S(\rho) = -\mathrm{Tr}\, \rho \log_2 \rho. \qquad (4)$$
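
To make Eqs. (2) and (4) concrete, the sketch below builds the density matrix of a qubit source and evaluates its von Neumann entropy from the closed-form eigenvalues of a real symmetric 2x2 matrix. The ensemble (the states $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$ with equal probabilities) and the function names are illustrative assumptions made here, and only real amplitudes are handled.

```python
import math

def qubit_density(ensemble):
    """Build rho = sum_i p_i |a_i><a_i| for real two-component state vectors.

    `ensemble` is a list of (probability, (x, y)) pairs; vectors are normalized here.
    """
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for p, vec in ensemble:
        norm_sq = vec[0] ** 2 + vec[1] ** 2
        for r in range(2):
            for c in range(2):
                rho[r][c] += p * vec[r] * vec[c] / norm_sq
    return rho

def von_neumann_entropy(rho):
    """S(rho) = -Tr rho log2 rho for a real symmetric 2x2 density matrix."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    mean = (a + d) / 2
    gap = math.sqrt(((a - d) / 2) ** 2 + b * b)
    eigenvalues = (mean + gap, mean - gap)
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-15)

# Assumed example: two non-orthogonal pure states, each with probability 1/2
ensemble = [(0.5, (1.0, 0.0)), (0.5, (1.0, 1.0))]
print(von_neumann_entropy(qubit_density(ensemble)))
```

For this non-orthogonal ensemble the von Neumann entropy comes out near 0.60, below the 1 bit of Shannon entropy of the probabilities alone, so less than one qubit per signal can suffice in the block-coding limit.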

The quantum noiseless coding theorem for pure states proved in Refs. [3] and [4] states the
following. Given any quantum source with von Neumann entropy $S(\rho)$ and any $\epsilon, \delta > 0$,

(a) If $S(\rho) + \delta$ qubits are available per signal, then for each sufficiently large $N$, there
exists a coding scheme with fidelity $F > 1 - \epsilon$ for signal strings of length $N$.

(b) If $S(\rho) - \delta$ qubits are available per signal, then any coding scheme for strings of
length $N$ will have a fidelity $F < \epsilon$ for all sufficiently large $N$.

Therefore, the von Neumann entropy may be interpreted as the minimal number of qubits 
needed for reliable (almost noiseless) coding. This noiseless coding theorem works only for 
pure signal states. It is natural to generalize it to signals which are mixed states
$\Pi_a$, with $\rho = \sum_a p(a) \Pi_a$. As noted in Refs. [3] and [4], it is not clear how to proceed. A
naive generalization of the fidelity,

$$F = \sum_a p(a)\, \mathrm{Tr}\, \Pi_a W_a, \qquad (5)$$

is in general not close to unity even when $W_a = \Pi_a$ for all signals, since $\mathrm{Tr}\, \Pi_a^2 < 1$ for a mixed state. To quantify the amount of distortion
of a particular coding scheme, a notion of the distance between two mixed states is desired. 
Such a concept has been introduced by Anandan [5] in the study of geometric phases. Let $\mathcal{D}$
denote the set of density operators representing the states of a given quantum system. ($\mathcal{D}$
consists of the Hermitian operators on the Hilbert space of this system with nonnegative
eigenvalues and trace equal to 1.) $\mathcal{D}$ is a topological space with the pure states contained
in its boundary. The set of pure states can be identified with the projective Hilbert space
$\mathcal{P}$. The inner product structure of a Hilbert space naturally induces a metric, namely the
Fubini-Study metric, on the projective Hilbert space $\mathcal{P}$, which can be extended into the rest
of $\mathcal{D}$. More concretely, the distance between two points $\rho$ and $\rho'$ in $\mathcal{P}$ is defined by

$$s(\rho, \rho')^2 = 4\left(1 - |\langle\psi|\psi'\rangle|^2\right) \qquad (6)$$

where $|\psi\rangle$ and $|\psi'\rangle$ are two normalized states contained in $\rho$ and $\rho'$. It is simple to check
that $s(\rho, \rho')$ satisfies all the axioms for a metric. Suppose that $\rho$ and $\rho'$ are separated by an
infinitesimal distance $ds$ in $\mathcal{P}$:

$$ds^2 = 4\left(1 - |\langle\psi|\psi'\rangle|^2\right) = 2\,\mathrm{Tr}(\rho - \rho')^2 \qquad (7)$$

where the last equality follows from $\mathrm{Tr}(\rho^2) = \mathrm{Tr}(\rho'^2) = 1$. This defines a Riemannian metric
on $\mathcal{P}$, called the Fubini-Study metric. It is therefore reasonable to introduce a flat metric

$$dS^2 = 2\,\mathrm{Tr}(d\rho^2) \qquad (8)$$

on $\mathcal{D}$. When restricted to the pure states, it becomes the Fubini-Study metric.
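
The last equality in Eq. (7) can be spelled out in one line. The sketch below assumes that the two pure states are written as projectors $\rho = |\psi\rangle\langle\psi|$ and $\rho' = |\psi'\rangle\langle\psi'|$ onto normalized vectors (the symbols $\psi$, $\psi'$ are notation adopted here), so that $\mathrm{Tr}(\rho\rho') = |\langle\psi|\psi'\rangle|^2$.

```latex
\begin{aligned}
\mathrm{Tr}(\rho - \rho')^2
  &= \mathrm{Tr}\,\rho^2 + \mathrm{Tr}\,\rho'^2 - 2\,\mathrm{Tr}(\rho\rho') \\
  &= 2 - 2\,|\langle\psi|\psi'\rangle|^2 ,
\end{aligned}
\qquad\text{so}\qquad
2\,\mathrm{Tr}(\rho - \rho')^2 = 4\left(1 - |\langle\psi|\psi'\rangle|^2\right).
```

The same expression $2\,\mathrm{Tr}(\rho - \rho')^2$ remains well defined when the purity condition is dropped, which is what makes the extension to mixed states natural.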

Suppose that a quantum source produces a sequence of signals, each of which is a mixed
state from the list $\Pi_1, \ldots, \Pi_m$, with probabilities $p_1, \ldots, p_m$, and that after coding, the signal
$\Pi_a$ is reconstituted as $W_a$. Motivated by the above discussion, we define the distortion

$$D = \sum_a p_a\, \mathrm{Tr}(\Pi_a - W_a)^2. \qquad (9)$$

Notice that $0 \le D \le 2$ and $D = 0$ if and only if $\Pi_a = W_a$ for every signal that occurs with nonzero probability. This definition is reasonable
because $\Pi_a - W_a$ is the deviation of $\Pi_a$ from $W_a$. To obtain a real-valued function, we take
the trace. However, $\mathrm{Tr}(\Pi_a - W_a)$ is identically zero. It is, therefore, natural to consider
$\mathrm{Tr}(\Pi_a - W_a)^2$ and take the ensemble average.
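
The contrast between the naive fidelity of Eq. (5) and the distortion of Eq. (9) can be checked numerically. The sketch below is a minimal illustration restricted to real matrices stored as nested lists; the two-signal qubit source (a maximally mixed state and the pure state $|0\rangle\langle 0|$) and the function names are assumptions made for the example.

```python
def trace_of_product(a, b):
    """Tr(AB) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def naive_fidelity(probs, signals, decoded):
    """Eq. (5): F = sum_a p_a Tr(Pi_a W_a)."""
    return sum(p * trace_of_product(s, w) for p, s, w in zip(probs, signals, decoded))

def distortion(probs, signals, decoded):
    """Eq. (9): D = sum_a p_a Tr(Pi_a - W_a)^2."""
    total = 0.0
    for p, s, w in zip(probs, signals, decoded):
        diff = [[x - y for x, y in zip(row_s, row_w)] for row_s, row_w in zip(s, w)]
        total += p * trace_of_product(diff, diff)
    return total

# Assumed example: perfect reconstruction, W_a = Pi_a for both signals
probs = [0.5, 0.5]
signals = [[[0.5, 0.0], [0.0, 0.5]],   # maximally mixed qubit state
           [[1.0, 0.0], [0.0, 0.0]]]   # pure state |0><0|
print(naive_fidelity(probs, signals, signals))  # 0.75
print(distortion(probs, signals, signals))      # 0.0
```

Even with perfect reconstruction the naive fidelity stays at 0.75, whereas the distortion vanishes, which is the behaviour a measure of coding quality should have.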
The ensemble of signals emitted by the source can be represented by the density operator

$$\rho = \sum_a p_a \Pi_a. \qquad (10)$$

Consider the following communication scheme discussed by Schumacher [3]. Suppose
that the signal is represented by a system $X$ which is composed of two subsystems, $C$ (for
"channel") and $E$ (for "extra"). Only the channel subsystem $C$ is transmitted to the receiver
and the subsystem $E$ is simply discarded. To recover (some approximation of) the signal,
we add to the channel system an auxiliary system $E'$ that is a copy of the discarded extra
system $E$. Schumacher called such a communication scheme an approximate transposition
via the limited channel $C$. For this type of communication scheme, we have the following
theorem:

Quantum Noiseless Coding Theorem for Mixed States. For any quantum source which
produces mixed signal states $\Pi_a$ with probabilities $p_a$, define the von Neumann entropy
$S(\rho)$ as in Eq. (4). For any $\epsilon, \delta > 0$,

(a) if $S(\rho) + \delta$ qubits are available per signal, then for each sufficiently large $N$, there
exists a coding scheme with $D < \epsilon$.

(b) if $S(\rho) - \delta$ qubits are available per signal, then for a sufficiently large $N$, any
approximate transposition coding scheme for a string of length $N$ has a distortion
$D > \sum_a p_a\, \mathrm{Tr}\, \Pi_a^2 - 2\epsilon$.

This implies that for a given quantum source, $D$ will not tend to zero unless at least
$S(\rho)$ qubits are available per signal. Therefore, $S(\rho)$ may again be interpreted as the mean
number of qubits needed for the noiseless coding of a source which emits signals that are mixed
states if an approximate transposition coding scheme is used.

To minimize our usage of resources, we would like to code signals on a $d$-dimensional
subspace $\Lambda$ of $\mathcal{H}_n$. (In applying the following lemmas to prove the main theorem, we will use
block coding. The signal states will therefore be $K$-blocks of signals.) Let $|b_1\rangle, |b_2\rangle, \ldots, |b_d\rangle$
be a basis of $\Lambda$ and $|b_{d+1}\rangle, |b_{d+2}\rangle, \ldots, |b_n\rangle$ a basis of $\Lambda^\perp$, the orthogonal complement of $\Lambda$.
For each $a$, $\Pi_a$ can be diagonalized and expressed in terms of its eigenvectors as

$$\Pi_a = \sum_{a_i} q_{a_i} \Pi_{a_i}, \qquad (11)$$

where $\Pi_{a_i} = |a_i\rangle\langle a_i|$ and for each $a$, $\sum_{a_i} q_{a_i} = 1$. Suppose that, with respect to the basis
$|b_1\rangle, |b_2\rangle, \ldots, |b_n\rangle$ (the blocks marked $\ast$ are not needed in what follows),

$$\Pi_{a_i} = |a_i\rangle\langle a_i| = \begin{pmatrix} M_{a_i} & \ast \\ \ast & \ast \end{pmatrix}, \qquad (12)$$



where $M_{a_i}$ is a $d \times d$ matrix. We now introduce an explicit coding scheme based on $\Lambda$. Let
$|0\rangle$ be an arbitrary state in $\Lambda$ and $P$ the projection onto $\Lambda$. For each $\Pi_{a_i} = |a_i\rangle\langle a_i|$, we
measure the observable $P$ on $|a_i\rangle$. If the result 0 is obtained, then $|0\rangle$ is substituted for the
post-measurement state. In other words, we associate with each $\Pi_{a_i}$ a density matrix

$$W_{a_i} = \begin{pmatrix} M_{a_i} & 0 \\ 0 & 0 \end{pmatrix} + (1 - \mathrm{Tr}\, M_{a_i})\, |0\rangle\langle 0|. \qquad (13)$$



Lemma 1. Suppose that the sum of the $d$ largest eigenvalues of the density operator $\rho$
is greater than $1 - \xi$. Let $\Lambda$ be the span of the $d$ eigenvectors of $\rho$ corresponding to the $d$
largest eigenvalues. Then the association $\Pi_a = \sum_{a_i} q_{a_i} \Pi_{a_i} \leftrightarrow W_a = \sum_{a_i} q_{a_i} W_{a_i}$, defined
by Eq. (13), has distortion $D < 2\xi$.



Proof: Note that $f(X) = \mathrm{Tr}(X^2)$ is a convex function. We have $\mathrm{Tr}[E(X)]^2 \le E(\mathrm{Tr}\, X^2)$
where $E(X)$ denotes the weighted mean of a variable $X$. Denoting $\Pi_a - W_a$ by $X_a$ and
$\Pi_{a_i} - W_{a_i}$ by $X_{a_i}$, so that $X_a = \sum_{a_i} q_{a_i} X_{a_i}$, the distortion

$$D = \sum_a p_a\, \mathrm{Tr}(X_a)^2 \le \sum_a \sum_{a_i} p_a q_{a_i}\, \mathrm{Tr}(X_{a_i})^2. \qquad (14)$$

Here convexity of the function $f(X) = \mathrm{Tr}(X^2)$ has been used. Now let $P$ denote the
projection operator onto $\Lambda$, the space spanned by the $d$ eigenvectors corresponding to the $d$
largest eigenvalues of $\rho$. By assumption,
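$$\mathrm{Tr}(\rho P) > 1 - \xi. \qquad (15)$$

Consider

$$\begin{aligned}
\sum_a \sum_{a_i} p_a q_{a_i}\, \mathrm{Tr}[\Pi_{a_i}(1 - P)]
&= \sum_a p_a\, \mathrm{Tr}\Big[\sum_{a_i} q_{a_i} \Pi_{a_i}(1 - P)\Big] \\
&= \sum_a p_a\, \mathrm{Tr}[\Pi_a (1 - P)] \\
&= \mathrm{Tr}\Big[\sum_a p_a \Pi_a (1 - P)\Big] \\
&= \mathrm{Tr}[\rho(1 - P)] \\
&< \xi. \qquad (16)
\end{aligned}$$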



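Notice that with Eqs. (14) and (16) we have essentially reduced the case of mixed signal
states to that of pure signal states with a priori probabilities $p_a q_{a_i}$. In what follows, we shall
therefore consider the case of pure signal states only. For simplicity, we also suppress the
index $a$. Write $|a_i\rangle$ in terms of its components in $\Lambda$ and $\Lambda^\perp$:

$$|a_i\rangle = \alpha_i |l_i\rangle + \beta_i |m_i\rangle \qquad (17)$$

where $\alpha_i, \beta_i \ge 0$, $\alpha_i^2 + \beta_i^2 = 1$, $|l_i\rangle \in \Lambda$ and $|m_i\rangle \in \Lambda^\perp$. For $\Pi_i = |a_i\rangle\langle a_i|$, we have

$$\Pi_i = \alpha_i^2 |l_i\rangle\langle l_i| + \alpha_i\beta_i |l_i\rangle\langle m_i| + \alpha_i\beta_i |m_i\rangle\langle l_i| + \beta_i^2 |m_i\rangle\langle m_i|. \qquad (18)$$

$\Pi_i$ is associated to

$$W_i = \alpha_i^2 |l_i\rangle\langle l_i| + \beta_i^2 |0\rangle\langle 0|. \qquad (19)$$

It is then a simple exercise to check that

$$\sum_i p_i\, \mathrm{Tr}(\Pi_i - W_i)^2 = 2\sum_i p_i \beta_i^2 = 2\sum_i p_i\, \mathrm{Tr}[\Pi_i(1 - P)] < 2\xi. \qquad (20)$$

This completes our proof of Lemma 1.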
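
The "simple exercise" behind Eq. (20), namely that a single pure signal contributes $\mathrm{Tr}(\Pi_i - W_i)^2 = 2\beta_i^2$, can be checked numerically. The sketch below is an illustration under assumptions made here: a real three-dimensional space, the coding subspace spanned by the first two basis vectors, the reference state $|0\rangle$ taken to be the first basis vector, and hypothetical function names.

```python
import math

def outer(u, v):
    """Matrix |u><v| for real vectors."""
    return [[x * y for y in v] for x in u]

def encode(state, d, ref):
    """Project-and-replace map of Eqs. (13) and (19) for a real pure state.

    The coding subspace is spanned by the first d basis vectors; `ref` is |0>.
    """
    n = len(state)
    kept = [x if k < d else 0.0 for k, x in enumerate(state)]  # component in the subspace
    beta_sq = 1.0 - sum(x * x for x in kept)                    # weight outside the subspace
    m_block = outer(kept, kept)
    ref_proj = outer(ref, ref)
    return [[m_block[r][c] + beta_sq * ref_proj[r][c] for c in range(n)] for r in range(n)]

def trace_sq_diff(a, b):
    """Tr(A - B)^2 for square matrices given as nested lists."""
    n = len(a)
    diff = [[a[r][c] - b[r][c] for c in range(n)] for r in range(n)]
    return sum(diff[r][c] * diff[c][r] for r in range(n) for c in range(n))

beta = 0.3
alpha = math.sqrt(1.0 - beta ** 2)
state = [alpha * 0.6, alpha * 0.8, beta]   # alpha|l> + beta|m> with |l> = (0.6, 0.8, 0)
ref = [1.0, 0.0, 0.0]                      # the fixed state |0> inside the subspace
signal = outer(state, state)
decoded = encode(state, 2, ref)
print(trace_sq_diff(signal, decoded), 2 * beta ** 2)  # the two values agree up to rounding
```

The result does not depend on the overlap between $|l_i\rangle$ and $|0\rangle$, because every cross term involves the inner product of $|m_i\rangle$ with a vector in the coding subspace.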

Lemma 2. Consider any coding scheme

$$\Pi_i \leftrightarrow W_i, \qquad i = 1, \ldots, m \qquad (21)$$

where $W_i$ is a density matrix supported on some $d$-dimensional subspace $\mathcal{D}$ of $\mathcal{H}_n$. If the
sum of the $d$ largest eigenvalues of $\rho$ is $\eta$, then the distortion $D \ge \sum_i p_i\, \mathrm{Tr}\, \Pi_i^2 - 2\eta$.

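Proof. Let us denote the projection onto $\mathcal{D}$ by $P'$ and the projection onto the space
spanned by the $d$ eigenvectors of $\rho$ with the $d$ largest eigenvalues by $P$. By assumption,
$\sum_i p_i\, \mathrm{Tr}[\Pi_i P'] = \mathrm{Tr}[\rho P'] \le \mathrm{Tr}[\rho P] = \eta$. Hence

$$\begin{aligned}
D &= \sum_i p_i\, \mathrm{Tr}[\Pi_i - W_i]^2 \\
&\ge \sum_i p_i\, \mathrm{Tr}\, \Pi_i^2 - 2\sum_i p_i\, \mathrm{Tr}[\Pi_i W_i] \\
&\ge \sum_i p_i\, \mathrm{Tr}\, \Pi_i^2 - 2\sum_i p_i\, \mathrm{Tr}[\Pi_i P'] \\
&\ge \sum_i p_i\, \mathrm{Tr}\, \Pi_i^2 - 2\eta. \qquad (22)
\end{aligned}$$

The first inequality drops the nonnegative term $\sum_i p_i\, \mathrm{Tr}\, W_i^2$, and the second uses $W_i \le P'$, which holds because $W_i$ is supported on $\mathcal{D}$ and has eigenvalues at most 1.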

Having proved the two lemmas, we proceed to prove the main theorem. For this, we
make use of the "Asymptotic Equipartition Property (AEP)" (an analog of the weak law
of large numbers) in classical information theory. The weak law of large numbers states
that for independent, identically distributed (i.i.d.) random variables, $\frac{1}{N}\sum_{i=1}^{N} X_i$ is close
to its expected value $E(X)$ for a large $N$. Functions of independent random variables
are also independent random variables. Since the $X_i$ are i.i.d., so are the $\log_2 p(X_i)$. Applying
the weak law of large numbers to the $\log_2 p(X_i)$, we obtain the AEP, which states that
$-\frac{1}{N} \log_2 p(X_1, X_2, \ldots, X_N)$ is close to the entropy $H$. Here $X_1, X_2, \ldots, X_N$ are i.i.d. random
variables and $p(X_1, X_2, \ldots, X_N) = p(X_1) p(X_2) \cdots p(X_N)$ is the probability of the occurrence of
the sequence $X_1, X_2, \ldots, X_N$. Therefore, it is highly likely that the probability assigned to
an observed sequence is close to $2^{-NH}$.

This enables us to divide the set of all possible sequences into two subsets, the set of
"typical sequences", where the sample entropy is close to the true entropy, and the atypical
set, which contains all other sequences. In the classical noiseless coding theorem, we just choose
our codewords in one-to-one correspondence with the typical set. In other words, we code
only the typical sequences. If an atypical sequence occurs, we accept failure. The important
point is that the probability for a sequence to be in the atypical set is small as $N$ gets large.
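
The behaviour of the typical set can be illustrated for a binary source, where sequences with the same number of ones share the same probability. The sketch below is an illustration under assumptions made here: a Bernoulli source with $p = 0.2$, a tolerance of 0.1 bits on the sample entropy, and a hypothetical function name. Probabilities are accumulated through base-2 logarithms to avoid overflow for long sequences.

```python
import math

def typical_set_stats(p, n, tol):
    """Return (probability, log2(size)/n) of the tol-typical set for n Bernoulli(p) draws.

    A sequence is typical when |-(1/n) log2 p(x_1..x_n) - H| <= tol.
    """
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    prob, size = 0.0, 0
    for k in range(n + 1):  # k = number of ones in the sequence
        log_seq_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_seq_prob / n - entropy) <= tol:
            count = math.comb(n, k)          # sequences with exactly k ones
            prob += 2.0 ** (math.log2(count) + log_seq_prob)
            size += count
    rate = math.log2(size) / n if size else float("-inf")
    return prob, rate

for n in (50, 1000):
    print(n, typical_set_stats(0.2, n, 0.1))
```

As $N$ grows, the probability of the typical set approaches 1 while the number of bits per symbol needed to index it stays below $H + 0.1$, which is the content of the classical noiseless coding theorem.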

Proof of the quantum noiseless coding theorem for mixed states. (a) Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the
eigenvalues of the density matrix $\rho$ of a quantum source $A$. Consider $\lambda_1, \lambda_2, \ldots, \lambda_n$ as the
probabilities of a probability distribution $\mathcal{P}$. The Shannon entropy $H(\mathcal{P})$ is the same as the
von Neumann entropy $S(\rho)$. Note also that the $K$-blocked version $A_K$ of $A$ has a density
matrix $\rho_K = \rho^{\otimes K}$. The AEP states that for sufficiently large $K$, there exists a set of
$2^{K(S+\delta)}$ eigenvalues of $\rho_K$ with a sum of eigenvalues greater than $1 - \epsilon/2$. Therefore, the
sum of the $2^{K(S+\delta)}$ largest eigenvalues must be larger than $1 - \epsilon/2$. By Lemma 1, there exists
a coding scheme for $A_K$ which uses $K(S + \delta)$ qubits per signal of $A_K$ and has distortion
$D < \epsilon$.

(b) Using the weak law of large numbers, it can be shown that, for all sufficiently large
$K$, any subset of $\mathcal{P}_K$ of size less than $2^{K(S-\delta)}$ has probability less than $\epsilon$. (See Ref. ||.) In
particular, the sum $\eta$ of the $2^{K(S-\delta)}$ largest eigenvalues will still be less than $\epsilon$. By Lemma 2,
we find that for all sufficiently large $K$, any coding scheme with $K(S - \delta)$ qubits per signal
will have distortion $D \ge \sum_i p_i\, \mathrm{Tr}\, \Pi_i^2 - 2\eta$.

This completes our proof of the noiseless coding theorem for mixed states. 
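
The eigenvalue counting used in part (a) can be tried out for a qubit source. The sketch below is an illustration under assumptions made here: the source density matrix has eigenvalues $(0.9, 0.1)$, the threshold $\xi = 0.01$ plays the role of $\epsilon/2$, and the function names are hypothetical. Eigenvalues of $\rho^{\otimes K}$ are added in whole multiplicity groups, largest first, so the returned rate slightly overestimates the minimal one.

```python
import math

def qubit_entropy(lam):
    """Von Neumann entropy of a qubit density matrix with eigenvalues (lam, 1 - lam)."""
    return -(lam * math.log2(lam) + (1 - lam) * math.log2(1 - lam))

def qubits_per_signal(lam, K, xi):
    """Estimate log2(d)/K, where the d largest eigenvalues of rho^(tensor K) sum to > 1 - xi.

    Assumes 1/2 < lam < 1, so eigenvalues lam^(K-j) (1-lam)^j decrease with j.
    """
    weight, d = 0.0, 0
    for j in range(K + 1):
        mult = math.comb(K, j)  # multiplicity of the eigenvalue lam^(K-j) (1-lam)^j
        log_mass = math.log2(mult) + (K - j) * math.log2(lam) + j * math.log2(1 - lam)
        weight += 2.0 ** log_mass
        d += mult
        if weight > 1.0 - xi:
            break
    return math.log2(d) / K

print(qubit_entropy(0.9))
for K in (100, 1000):
    print(K, qubits_per_signal(0.9, K, 0.01))
```

The number of qubits per signal needed to hold all but a fraction $\xi$ of the weight decreases toward $S(\rho)$ as the block length $K$ grows, in line with Lemma 1 combined with the AEP.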

Note that this theorem applies only to approximate transposition coding schemes. Is it
possible to devise a more efficient coding scheme? The answer is yes [0 . Mixed-state signals
might be reconstituted from a compressed version by adjoining an ancilla in a standard state
and applying a measurement process. Suppose, for instance, that there are two signals $\rho_1$
and $\rho_2$ with probabilities $p_1$ and $p_2$ respectively, and that these signals live in a 4-dimensional
space with supports in two 2-dimensional subspaces which are orthogonal to each other. We
can compress the data as follows: Measure the signal. Since the two signals have orthogonal
supports, the measurement tells us with certainty which of the two signals we are given.
Record the possible outcomes of our measurement by pure orthogonal (i.e. classical) states
$|1\rangle$ and $|2\rangle$ occurring with probabilities $p_1$ and $p_2$. It follows from Shannon's classical
noiseless coding theorem that the signal can further be compressed to the Shannon entropy
$H(p_1, p_2)$ qubits/signal. To reconstitute the signals, we simply decode (and decompose) the
classical signal and represent each binary digit 0 or 1 in the resulting sequence by a
density matrix $\rho_1$ or $\rho_2$ accordingly. But $H(p_1, p_2)$ is less than $S(p_1\rho_1 + p_2\rho_2)$ (if either $\rho_1$
or $\rho_2$ is a mixed state). The limit set by the mixed-state coding theorem is, thus, beaten by
the above method. A natural question to ask is: what is the information-theoretic limit of
the compression rate of mixed-state signals that no coding scheme can surpass?
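
The gap between the two rates in this example is easy to quantify. When the signals have mutually orthogonal supports, the eigenvalues of $p_1\rho_1 + p_2\rho_2$ are simply those of $\rho_1$ scaled by $p_1$ together with those of $\rho_2$ scaled by $p_2$, so that $S(p_1\rho_1 + p_2\rho_2) = H(p_1, p_2) + p_1 S(\rho_1) + p_2 S(\rho_2)$. The sketch below assumes a hypothetical instance: equal probabilities, $\rho_1$ maximally mixed on its 2-dimensional support and $\rho_2$ pure.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mixture_spectrum(weights, spectra):
    """Eigenvalues of sum_k p_k rho_k when the rho_k have mutually orthogonal supports."""
    return [p * lam for p, spec in zip(weights, spectra) for lam in spec]

weights = [0.5, 0.5]                 # p_1, p_2
spectra = [[0.5, 0.5], [1.0, 0.0]]   # eigenvalues of rho_1 and rho_2 on their supports
classical_rate = entropy(weights)
transposition_rate = entropy(mixture_spectrum(weights, spectra))
print(classical_rate, transposition_rate)  # 1.0 1.5
```

In this instance the measure-and-record scheme needs 1 qubit per signal, whereas an approximate transposition scheme needs 1.5; the difference is the average entropy $p_1 S(\rho_1) + p_2 S(\rho_2)$ of the individual signals.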

Another point to note is that Shannon's more important results deal with channels with
noise. The information capacity of a noisy channel deserves further investigation.

After the completion of this work we learned that Jozsa || has proven essentially the
same result, using Uhlmann's transition probability formula as a fidelity function [10].